Automatically Extracting Collocations Based on Words Position Information in Corpora
Similar resources
Extracting Collocations from Text Corpora
A collocation is a habitual word combination. Collocational knowledge is essential for many tasks in natural language processing. We present a method for extracting collocations from text corpora. By comparison with the SUSANNE corpus, we show that both high precision and broad coverage can be achieved with our method. Finally, we describe an application of the automatically extracted collocati...
Extracting Collocations from syntactically annotated biomedical Corpora
This thesis investigates the extraction of frequently used phrases (so called collocations) from biomedical text sources. The extraction of uninterrupted collocation candidates is introduced. For interrupted candidates, with gaps between their subcomponents, a new technique using suffix tries is developed. It is based on the iterative extension of frequent smaller patterns. This reduces computa...
Automatically Extracting Typical Syntactic Differences from Corpora
We develop an aggregate measure of syntactic difference for automatically finding common syntactic differences between collections of text. With the use of this measure it is possible to mine for differences between, for example, the English of learners and natives, or between related dialects. If formulated in advance, hypotheses can also be tested for statistical significance. It enables us to...
Automatically Extracting and Representing Collocations for Language Generation
Collocational knowledge is necessary for language generation. The problem is that collocations come in a large variety of forms. They can involve two, three or more words, these words can be of different syntactic categories and they can be involved in more or less rigid ways. This leads to two main difficulties: collocational knowledge has to be acquired and it must be represented flexibly so ...
Extracting collocations and their translations from parallel corpora
Identifying collocations in a text (e.g., break record) and correctly translating them (battre record vs. *casser record) represent key issues in machine translation, notably because of their prevalence in language and their syntactic flexibility. This article describes a method for discovering translation equivalents for collocations from parallel corpora, aimed at increasing the lexical cover...
Journal
Journal title: Journal of Natural Language Processing
Year: 1998
ISSN: 1340-7619, 2185-8314
DOI: 10.5715/jnlp.5.79